
Data preparation

The human ionomics data set has already been pre-processed, so we only need to derive the symbolic data from it:

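# Read the pre-processed human ionomics data (first column: gene names)
dat <- read.table("./test-data/human.csv", header = TRUE, sep = ",")
# Keep only the first row for each duplicated gene name
dat <- dat[!duplicated(dat[, 1]), ]
colnames(dat)[1] <- "Line"
# Convert ion values to symbolic values (-1, 0, 1) with a threshold of 3
dat_symb <- symbol_data(x = dat, thres_symb = 3)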

A random sample of ten genes from the ionomics data and from the symbolic data:

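library(dplyr)       # %>% and sample_n()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

dat %>% sample_n(10) %>%
  kable(caption = 'Ionomics data', digits = 2, booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 10,
                latex_options = c("striped", "scale_down"))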
Table 1: Ionomics data
Line As B Ca Cd Co Cu Fe K Li Mg Mn Mo Na Ni P S Se Zn
SIRT1 0.62 -1.65 -1.77 -1.41 -1.99 0.60 -0.82 -3.86 1.17 3.23 -1.96 -0.40 -1.74 -2.18 2.08 -1.11 -0.96 1.12
JAK1 -2.63 1.75 -0.86 1.24 -0.56 -0.11 2.93 0.82 1.57 1.79 0.91 1.23 1.10 2.19 0.65 0.43 -1.61 1.50
TUSC3 1.33 -2.22 -1.63 -0.49 0.49 -2.26 2.65 4.77 3.04 4.88 -1.43 3.42 -2.40 -0.79 1.53 1.78 -1.00 2.81
HRSP12 2.27 1.50 -1.43 0.91 -0.82 -0.42 -1.14 -0.52 1.66 1.11 -0.89 0.10 -0.59 -0.30 2.53 1.52 -1.39 1.60
MRPL50 -0.02 1.12 1.27 3.79 2.25 1.89 2.61 0.50 -1.09 1.36 1.14 0.87 1.27 1.31 0.13 0.84 -1.42 0.78
AEBP2 -0.02 -0.12 -3.82 -0.84 -1.83 -0.78 -0.82 -2.55 1.59 -2.13 0.69 -3.49 -1.03 0.70 -1.20 -0.77 -0.18 0.03
PIN1L -0.02 -0.81 -1.50 -0.75 -0.87 2.45 -2.16 -1.24 0.12 -1.53 3.54 -0.40 -1.35 -0.70 -2.58 -0.80 -0.52 -2.26
WDR17 0.86 -2.46 -0.17 -3.60 -1.00 -1.45 1.40 1.89 -2.44 2.69 -0.32 -0.06 -1.06 -0.16 1.29 -0.14 -1.24 1.10
SQLE -1.24 -0.68 -1.20 -0.42 -1.30 6.73 1.75 -1.99 -0.35 0.72 -3.74 -0.81 -0.81 0.65 1.56 -1.25 0.43 4.31
EIF2B2 -0.13 -2.80 -2.47 -0.48 -1.04 1.23 -1.74 -1.07 -0.59 -2.24 -1.76 -1.01 -2.63 1.60 1.57 -1.77 1.11 -0.44

dat_symb %>% sample_n(10) %>%
  kable(caption = 'Symbolic data', booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 10,
                latex_options = c("striped", "scale_down"))
Table 1: Symbolic data
Line As B Ca Cd Co Cu Fe K Li Mg Mn Mo Na Ni P S Se Zn
ZNF197 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
UBE2B 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0
ZDHHC21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CDK5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 -1
ALG10 0 -1 0 0 0 0 0 0 0 0 0 0 -1 -1 0 0 0 0
SRPR 0 0 0 0 0 -1 1 0 0 0 -1 0 0 0 0 0 0 0
WDR17 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ZNF192 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 1 0 0
KARS 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0
PGLYRP2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
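
In the rendered samples above, WDR17 appears in both tables, and its values suggest how the symbolic data relate to the ionomics data: Cd = -3.60 becomes -1, while B = -2.46 and Mg = 2.69 become 0. The following is a minimal base-R sketch of that thresholding, assuming that values above thres_symb map to 1, values below -thres_symb map to -1 and everything else maps to 0. symbolise() is a hypothetical helper, not the package's symbol_data(), and its handling of values exactly at the threshold is a guess.

```r
# Hypothetical helper, not symbol_data(): map each ion value to 1, -1 or 0
# with a symmetric threshold; the first column (gene names) is kept as-is.
symbolise <- function(x, thres = 3) {
  vals <- as.matrix(x[, -1, drop = FALSE])
  symb <- ifelse(vals > thres, 1L, ifelse(vals < -thres, -1L, 0L))
  data.frame(x[, 1, drop = FALSE], symb, check.names = FALSE)
}

# WDR17 values for three ions, copied from the ionomics table above
wdr17 <- data.frame(Line = "WDR17", B = -2.46, Cd = -3.60, Mg = 2.69)
symbolise(wdr17, thres = 3)  # B = 0, Cd = -1, Mg = 0
```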

Both data sets are then filtered to remove genes whose symbolic values are all zero:

# Keep genes with at least one non-zero symbolic value
idx <- rowSums(abs(dat_symb[, -1])) > 0
dat <- dat[idx, ]
dat_symb <- dat_symb[idx, ]
dim(dat)
#> [1] 434  19

Data clustering

Hierarchical cluster analysis is the key step for both the gene network and the gene enrichment analysis. The methodology is as follows:

For example, with a minimum cluster size of 8:

min <- 8  # minimum number of genes per cluster (min_clust_size)
clust <- gene_clus(dat_symb[, -1], min_clust_size = min)
names(clust)
#> [1] "clus"    "idx"     "tab"     "tab_sub"
clust$tab_sub
#>   cluster nGenes
#> 1      14     11
#> 2       4     10
#> 3      24     10
#> 4      79     10
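
As a rough illustration of hierarchical clustering on symbolic profiles, the sketch below uses base R's dist(), hclust() and cutree(), then keeps clusters with at least min_clust_size genes; that size filter is consistent with clust$tab_sub above listing only clusters of 10 or 11 genes for min = 8. The Manhattan distance, the single linkage and the cut at height 0 (which puts genes with identical symbolic profiles into the same cluster) are assumptions, and sketch_gene_clus() is a hypothetical name rather than the package's gene_clus().

```r
# Hypothetical sketch, not gene_clus(): hierarchical clustering of symbolic
# profiles, keeping only clusters with at least min_clust_size genes.
sketch_gene_clus <- function(symb, min_clust_size = 8) {
  d  <- dist(as.matrix(symb), method = "manhattan")  # assumed distance
  hc <- hclust(d, method = "single")                  # assumed linkage
  clus <- cutree(hc, h = 0)   # h = 0: identical profiles share a cluster
  sizes <- table(clus)
  keep <- as.integer(names(sizes)[sizes >= min_clust_size])
  list(clus = clus,
       tab_sub = data.frame(cluster = keep,
                            nGenes  = as.integer(sizes[as.character(keep)])))
}

# Toy symbolic profiles: genes 1-3 identical, genes 4-5 identical, gene 6 unique
toy <- rbind(c( 1, 0, -1), c(1, 0, -1), c(1, 0, -1),
             c( 0, 1,  0), c(0, 1,  0),
             c(-1, 0,  0))
sketch_gene_clus(toy, min_clust_size = 2)$tab_sub
```

Cutting at height 0 rather than at a fixed number of clusters means the number of clusters is driven by how many distinct symbolic profiles exist, which would explain the large cluster IDs (such as 79) in the output above, although that is an inference rather than documented behaviour.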

Gene network

The gene network uses both the ionomics and the symbolic data: similarity measures computed on the ionomics data are used to construct the network. Before the network is created, the data are further filtered by:

The similarity methods implemented are pearson, spearman, kendall, cosine, mahal_cosine and hybrid_mahal_cosine.
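
To make some of these options concrete, here is a base-R sketch of three gene-by-gene similarity matrices for a genes-by-ions matrix: Pearson correlation via cor(), plain cosine similarity, and a Mahalanobis cosine. For the latter, the formula x' S^-1 y / sqrt((x' S^-1 x)(y' S^-1 y)), with S the covariance matrix of the ions, is an assumption about what mahal_cosine computes; hybrid_mahal_cosine is not sketched, and none of these functions are the package's own.

```r
# Gene-by-gene similarity matrices for a numeric genes x ions matrix `x`.
# Illustrative sketches only, not the package's implementation.

pearson_sim <- function(x) cor(t(x))   # correlate gene rows across ions

cosine_sim <- function(x) {
  norms <- sqrt(rowSums(x^2))
  (x %*% t(x)) / outer(norms, norms)
}

# Assumed definition: cosine similarity under the metric S^-1,
# where S is the covariance matrix of the ions (columns).
mahal_cosine_sim <- function(x) {
  s_inv <- solve(cov(x))
  g <- x %*% s_inv %*% t(x)            # x_i' S^-1 x_j for all gene pairs
  g / sqrt(outer(diag(g), diag(g)))
}

set.seed(1)
x <- matrix(rnorm(40 * 5), nrow = 40)  # 40 toy genes, 5 toy ions
round(pearson_sim(x)[1:3, 1:3], 2)
```

Whitening by S^-1 down-weights ions that vary together, so two genes are not judged similar merely because they share a strongly correlated pair of ions.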

We use the Pearson correlation as similarity measure for network analysis:

net <- GeneNetwork(data = dat,
                   data_symb = dat_symb,
                   min_clust_size = min,
                   thres_corr = 0.6,
                   method_corr = "pearson")
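
The thres_corr = 0.6 argument suggests that two genes are linked only when their similarity exceeds 0.6. Below is a minimal base-R sketch of turning a similarity matrix into an edge list. It assumes the absolute similarity is compared with a strict > (both the absolute value and the strict inequality are assumptions), and sim_to_edges() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: turn a symmetric gene similarity matrix into an
# edge list, keeping pairs whose absolute similarity exceeds `thres`.
sim_to_edges <- function(sim, thres = 0.6) {
  keep <- upper.tri(sim) & abs(sim) > thres   # each pair once, no self-loops
  idx <- which(keep, arr.ind = TRUE)          # column-major, same as sim[keep]
  data.frame(from = rownames(sim)[idx[, "row"]],
             to   = colnames(sim)[idx[, "col"]],
             weight = sim[keep],
             stringsAsFactors = FALSE)
}

genes <- c("g1", "g2", "g3")
sim <- matrix(c(1.0,  0.8,  0.1,
                0.8,  1.0, -0.7,
                0.1, -0.7,  1.0),
              nrow = 3, dimnames = list(genes, genes))
sim_to_edges(sim, thres = 0.6)   # edges g1-g2 (0.8) and g2-g3 (-0.7)
```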

The network with nodes coloured by the symbolic data clustering is:

net$plot.pnet1

Figure 1: Network with Pearson correlation: symbolic clustering

The same network, with nodes coloured by network community detection:

net$plot.pnet2

Figure 2: Network with Pearson correlation: community detection

The network analysis also returns a network impact and betweenness plot:

net$plot.impact_betweenness

Figure 3: Network with Pearson correlation: impact and betweenness

For comparison, we also build the network using the Mahalanobis cosine similarity:

net_2 <- GeneNetwork(data = dat,
                     data_symb = dat_symb,
                     min_clust_size = min,
                     thres_corr = 0.6,
                     method_corr = "mahal_cosine")
net_2$plot.pnet1

Figure 4: Network with Mahalanobis Cosine: symbolic clustering

net_2$plot.pnet2

Figure 5: Network with Mahalanobis Cosine: community detection

Finally, we use the hybrid Mahalanobis cosine similarity:

net_3 <- GeneNetwork(data = dat,
                     data_symb = dat_symb,
                     min_clust_size = min,
                     thres_corr = 0.6,
                     method_corr = "hybrid_mahal_cosine")
net_3$plot.pnet1

Figure 6: Network with Hybrid Mahalanobis Cosine: symbolic clustering

net_3$plot.pnet2

Figure 7: Network with Hybrid Mahalanobis Cosine: community detection

Enrichment analysis

The enrichment analysis is based on the symbolic data clustering: the genes in each cluster form a target gene set, while all genes in the data set form the gene universe.
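
Over-representation tests of this kind are commonly based on the hypergeometric distribution. Given a universe of genes, some of which carry a KEGG pathway or GO term, the p-value is the probability of seeing at least the observed number of annotated genes in a cluster of that size drawn at random. Whether kegg_enrich() and go_enrich() use exactly this test is an assumption; the sketch below uses made-up counts purely to show the calculation with base R's phyper().

```r
# One-sided hypergeometric p-value P(X >= hits) for over-representation.
#   universe_size: genes in the universe
#   term_size:     universe genes annotated to the KEGG pathway or GO term
#   cluster_size:  genes in the target cluster
#   hits:          cluster genes annotated to the term
enrich_pval <- function(hits, cluster_size, term_size, universe_size) {
  phyper(hits - 1, m = term_size, n = universe_size - term_size,
         k = cluster_size, lower.tail = FALSE)
}

# Made-up counts for illustration, not taken from the tables in this document
enrich_pval(hits = 2, cluster_size = 10, term_size = 8, universe_size = 400)
```

The hits - 1 is needed because phyper(q, ..., lower.tail = FALSE) returns P(X > q), so P(X >= hits) corresponds to q = hits - 1.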

KEGG enrichment analysis with a p-value cutoff of 0.05:

kegg <- kegg_enrich(data = dat_symb, min_clust_size = min, pval = 0.05,
                    annot_pkg =  "org.Hs.eg.db")

kegg %>% 
  kable(caption = 'KEGG enrichment analysis',
        digits = 3, booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 10,
                latex_options = c("striped", "scale_down"))
Table 2: KEGG enrichment analysis
Cluster KEGGID Pvalue Count Size Term
Cluster 24 (10 genes) 00510 0.032 2 9 N-Glycan biosynthesis
Cluster 79 (10 genes) 00520 0.000 2 4 Amino sugar and nucleotide sugar metabolism

Note that KEGG enrichment analysis may return no results. In that case, arguments such as min_clust_size can be adjusted as appropriate.

GO term enrichment analysis with the biological process ontology, BP (the other two options are MF, molecular function, and CC, cellular component):

go <- go_enrich(data = dat_symb, min_clust_size = min, pval = 0.05,
                ont = "BP", annot_pkg =  "org.Hs.eg.db")
go %>% head() %>% 
  kable(caption = 'GO Terms enrichment analysis',
        digits = 3, booktabs = TRUE) %>%
  kable_styling(full_width = FALSE, font_size = 10,
                latex_options = c("striped", "scale_down"))
Table 3: GO Terms enrichment analysis
Cluster ID Description Pvalue Count CountUniverse Ontology
Cluster 14 (11 genes) GO:0009615 response to virus 0.0132 2 8 BP
Cluster 14 (11 genes) GO:0007059 chromosome segregation 0.025 2 11 BP
Cluster 4 (10 genes) GO:0051092 positive regulation of NF-kappaB transcription factor activity 0.0023 2 3 BP
Cluster 4 (10 genes) GO:0043410 positive regulation of MAPK cascade 0.0197 2 8 BP
Cluster 4 (10 genes) GO:0051090 regulation of DNA-binding transcription factor activity 0.0197 2 8 BP
Cluster 4 (10 genes) GO:0006955 immune response 0.0327 3 27 BP

Exploratory analysis

The exploratory analysis performs PCA and correlation analysis of the ions with respect to the genes. Note that this analysis treats ions as samples (replicates) and genes as variables (features). It is intended for use at an early stage of the analysis.
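
Treating ions as samples and genes as variables amounts to transposing the genes-by-ions matrix before PCA, while ion-ion correlations are computed across genes. A base-R sketch of both steps is shown below on random toy data rather than dat; centring and scaling before the PCA are assumptions, since the preprocessing used by ExploratoryAnalysis() is not stated here.

```r
set.seed(42)
# Toy stand-in for dat[, -1]: 50 genes (rows) x 6 ions (columns)
x <- matrix(rnorm(50 * 6), nrow = 50,
            dimnames = list(paste0("gene", 1:50),
                            c("Ca", "Cd", "Cu", "Fe", "K", "Zn")))

# PCA with ions as samples: transpose so rows are ions and columns are genes
pca <- prcomp(t(x), center = TRUE, scale. = TRUE)
ion_scores <- pca$x[, 1:2]   # PC1/PC2 coordinates of each ion

# Pearson correlation between ions, computed across genes
ion_corr <- cor(x, method = "pearson")
```

With only as many samples as ions, prcomp() returns at most that many components, so the PCA plot summarises a handful of points (one per ion) in gene space.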

We apply it to the pre-processed data dat before any other analysis:

expl <- ExploratoryAnalysis(data = dat)
names(expl)
#> [1] "plot.pca"       "data.pca.load"  "plot.corr"      "plot.corr.heat"
#> [5] "plot.heat"      "plot.net"

The PCA plot is:

expl$plot.pca

Figure 8: Ion PCA plot on pre-processed data

The Pearson correlations between ions are shown as a correlation plot, a heatmap and a network plot:

expl$plot.corr

Figure 9: Ion correlation plot on pre-processed data

expl$plot.corr.heat

Figure 10: Ion correlation heatmap on pre-processed data

expl$plot.net

Figure 11: Ion correlation network on pre-processed data

The correlation between ions and genes is shown as a heatmap with dendrograms:

expl$plot.heat

Figure 12: Correlation between ions and genes on pre-processed data